Investigating the effect of textural properties on CO2 adsorption in porous carbons via deep neural networks using various training algorithms

The adsorption of carbon dioxide (CO2) on porous carbon materials offers a promising avenue for cost-effective CO2 emissions mitigation. This study investigates the impact of textural properties, particularly micropores, on CO2 adsorption capacity. Multilayer perceptron (MLP) neural networks were trained with various algorithms to simulate CO2 adsorption. The Levenberg–Marquardt (LM) algorithm achieved the lowest mean squared error (MSE), 2.6293E−5, indicating superior accuracy. Efficiency analysis shows that the scaled conjugate gradient (SCG) algorithm has the shortest runtime, while the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm requires the longest. The LM algorithm also converges in the fewest epochs. Furthermore, optimization identifies an optimal radial basis function (RBF) network configuration with nine neurons in the hidden layer and an MSE of 9.840E−5. Evaluation with new data points shows that the MLP network using the LM and Bayesian regularization (BR) algorithms achieves the highest accuracy. This research underscores the potential of MLP deep neural networks with the LM and BR training algorithms for process simulation and provides insights into the pressure-dependent behavior of CO2 adsorption. These findings contribute to the understanding of CO2 adsorption processes and support the prediction of gas adsorption behavior, especially in scenarios where micropores dominate at lower pressures and mesopores at higher pressures.

The amount of CO2 adsorbed
V_micro: Micropore volume
V_meso: Mesopore volume
w_i: Set of variables that need to be updated
∇f(w_i): Gradient of the loss function f
y_0: Primary training direction vector
c_i: Conjugate parameter
H: Hessian matrix estimation
J: Jacobian matrix
e: Vector of network errors
g: Gradient
F: Objective function
E_D: Mean squared error function
E_W: Weight attenuation function

www.nature.com/scientificreports/

ability to produce a reasonable estimate, the regression polynomial method makes it challenging to obtain an ideal empirical formula. In addition, this regression equation cannot learn and improve as the database grows 46. Importantly, according to Eq. (1), ordered mesoporous carbon without micropores ought to have almost no capacity to adsorb CO2, which is unrealistic in some instances. In addition, the equation is derived from a small sample, with only 12 varieties of porous carbon examined in that research. All of these samples were measured at a pressure of 5 bar and a temperature of 25 °C, so adsorbers designed with this relationship are restricted to these operating conditions.
So far, quantitative investigations utilizing the known textural parameters of porous carbons have enabled precise forecasts of CO2 adsorption. Shen et al. 47 developed hierarchical porous activated carbon fibers, which exhibit faster adsorption rates and higher capacity than pure carbon materials. Xia et al. 48 showed that carbons with high surface area, microporosity, nitrogen content, and a zeolite-templated structure have the highest CO2 adsorption capacity. Casco et al. 49 observed that activated carbons derived from crude oil and potassium hydroxide display exceptional CO2 adsorption performance under atmospheric and high-pressure conditions. The optimal carbon structure depends on the specific application, with narrow micropores controlling the uptake and ensuring high delivery capacity. The existing literature therefore lacks a comprehensive and highly accurate model that can predict the amount of carbon dioxide adsorbed from the textural properties of adsorbents and the operating conditions. Deep learning (DL) models are frequently regarded as a robust modeling approach because of their capacity to yield highly accurate predictions. However, these models cannot offer a quantitative analysis of each input, as they operate as black boxes.
In recent years, deep learning has exhibited significant potential in addressing a multitude of material research-related challenges 50-53. Specifically, neural network modeling and simulation have been applied to study carbon dioxide adsorption processes. Dashti et al. 54 created an MLP network to assess the adsorption of pure gases on activated carbon and zeolite-5A. They developed accurate models using input parameters such as temperature, pressure, pore size, and kinetic diameter. To optimize the model, various hidden layers were constructed, and the dataset's AARD% value was used to evaluate performance. Fotoohi et al. 55 used four two-dimensional equations of state to assess the adsorption of pure and binary gases onto activated carbons. They applied the LM algorithm for model learning and the ANN method for prediction. The optimal ANN architectures, 1-6-7 for pure gas adsorption and 1-7-9 for binary gas adsorption, were more precise than the two-dimensional equations of state. Iraji et al. 56 surveyed the adsorption of CO2 and SO2 on modified carbon nanotubes. They proposed an MLP neural network model with three hidden layers and 10 neurons, which was trained with the LM training algorithm. Lepri et al. 57 developed reduced-order PSA models using artificial neural networks that demonstrated excellent agreement between ANN and simulation results, allowing for their implementation in optimization environments for PSA cycle synthesis. Meng et al. 58 explored the adsorption of supercritical CO2 and CH4 on coal using conventional isotherm models. In addition, they proposed a novel machine learning model based on a 15-neuron neural network. Zhang et al. 44 predicted the CO2 adsorption capacity of porous carbons using a DL algorithm with five input parameters: specific surface area, micropore volume, mesopore volume, temperature, and pressure. The highest prediction accuracy was achieved when all three textural properties were used. At low pressure, microporous carbon adsorbs more CO2, whereas hierarchical porous carbon adsorbs more at high pressure. However, the DNN model presented in that research was built with the MATLAB toolbox, whose limitations, such as restricting the model to two hidden layers with an equal number of neurons and not allowing the default activation functions of the layers to be changed, mean that the resulting model cannot be optimal. Table 1 presents a summary of further research on ANN models for simulating CO2 adsorption processes. The diversity of training algorithms applied to MLP neural networks in CO2 adsorption processes is limited compared to other processes, with the Levenberg-Marquardt (LM) and Bayesian regularization (BR) algorithms being the primary choices.
To fill the aforementioned gaps, an MLP neural network and, for the first time, an RBF network were used to model and simulate 1345 collected data points representing more than 200 distinct porous carbon adsorbents in order to predict the amount of CO2 adsorbed. Inputs for these models included adsorbent textural properties (BET surface area, mesopore volume, and micropore volume) and operating conditions (temperature and pressure). According to the literature, the effects of textural properties and adsorption conditions on CO2 adsorption are not independent; therefore, Pearson correlation coefficient analysis was used as a preliminary step to investigate the primary linear relationships between any two variables. In the MLP neural network, 13 distinct training algorithms were applied, and four activation function combinations for the hidden layers were tested with each algorithm to optimize the network. The accuracy, run duration, and number of epochs were then used as criteria for comparing the models in order to choose the best MLP model. Based on the simulation results, the present study thoroughly evaluates the factors that influence CO2 adsorption and addresses the lack of such analysis in previous studies.

Data gathering and preparation
More than 150 papers were screened to compile data for this study. The data used for modeling and simulation were selected from literature covering over 200 adsorbents operating at various temperatures and pressures. Some of these data were collected by Zhang et al. 44, and this study added 325 new data points to their collection. The BET surface area, mesopore volume, micropore volume, temperature, pressure, and the amount of carbon dioxide adsorbed are presented in Table 2. Acquiring a precise neural network model requires a large quantity of high-quality data. A standard for gathering data was established to reduce the possibility of error caused by varying approaches to calculating the parameters that influence adsorption. The adsorbent must contain either zero or a negligible amount of nitrogen. Nitrogen-doped carbon-based adsorbents demonstrate superior CO2 adsorption capacity and heightened adsorption selectivity compared to their nitrogen-free counterparts. This improvement can be attributed to the significant enhancement of basic adsorption sites within these adsorbents due to the presence of nitrogen 66. Therefore, to prevent errors, only nitrogen-free sorbents were investigated. All specific surface areas were calculated using the BET equation from nitrogen adsorption at 77 K. In this database, the total pore volume of the adsorbent was estimated from the adsorption of liquid nitrogen at a relative pressure of 0.95-0.99. The Dubinin-Radushkevich (D-R) equation was used to obtain the volume of micropores. The volume of mesopores was calculated by subtracting the volume of micropores from the total pore volume. The BET surface area was in square meters per gram (m2/g), the volumes of micropores and mesopores were in cubic centimeters per gram (cm3/g), the temperature was in degrees Celsius (°C), the pressure was in bar, and the amount of carbon dioxide adsorbed was in millimoles of CO2 per gram of adsorbent (mmol/g).

The Pearson correlation coefficient of a pair of variables is the ratio of their covariance to the product of their standard deviations. Based on the Pearson correlation coefficient matrix (Fig. 1), there is no significant linear correlation between adsorbent textural properties and CO2 uptake capacity. The correlations between adsorption capacity and the volumes of mesopores and micropores (R = 0.017 and R = 0.147, respectively) indicate weak relationships. Moreover, there was a strong positive correlation (R = 0.807) between micropore volume and BET surface area. In contrast, CO2 adsorption capacity showed a strong positive correlation with pressure (R = 0.776) and a weak negative correlation with temperature (R = −0.238).
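The definition of the Pearson coefficient given above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in the study; the function name `pearson_r` and the sample values are hypothetical and are not taken from the database.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient: covariance of x and y divided by the
    product of their standard deviations (the 1/n factors cancel)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustrative values (not from the study's database).
pressure = [1.0, 2.0, 3.0, 4.0, 5.0]
uptake = [1.1, 1.9, 3.2, 3.9, 5.1]
r = pearson_r(pressure, uptake)
```

Applying such a function to every pair of columns yields a correlation matrix of the kind shown in Fig. 1. A column with zero variance would cause a division by zero and needs separate handling.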
In this study, 1345 data points were acquired, of which 1300 were used to develop the artificial neural network model. The remaining 45 data points, chosen randomly, were set aside to test the prediction of carbon dioxide adsorption on new data. MATLAB software randomly assigned 80% of the 1300 data points to training, 10% to validation, and the remaining 10% to testing. Selecting the proper inputs and output is one of the important stages in creating a neural network model. As stated in the introduction, previous research has demonstrated that, in addition to operational parameters such as temperature and pressure, the textural properties of the adsorbent, including the BET surface area and the pore volume, especially the volumes of mesopores and micropores, significantly influence the adsorption process. As a result, BET surface area, mesopore volume, micropore volume, temperature, and pressure were chosen as network input variables. The objective of the model is to predict the carbon dioxide adsorption capacity. Therefore, the amount of carbon dioxide adsorbed is the network's output. To reduce the impact of parameters with greater magnitude on the ANN design, the entire database was normalized to the range 0-1 (Supplementary Information).
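The 0-1 scaling and the 80/10/10 split described above can be sketched as follows. This is an assumed min-max formulation in plain Python, not the MATLAB routine used in the study; the helper names `min_max_scale` and `split_indices` and the fixed seed are hypothetical.

```python
import random

def min_max_scale(column):
    """Scale a list of numbers linearly to the range 0-1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 1300 modeling points, as in the text: 1040 / 130 / 130.
train_idx, val_idx, test_idx = split_indices(1300)
```

Each input column and the output column would be scaled separately, and predictions would be mapped back to mmol/g with the inverse transform.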

Artificial neural networks
An artificial neural network (ANN) is a computational model inspired by the architecture of the human brain and used to predict intricate, non-linear systems. Within the network's structure, artificial neurons are interconnected across the input, hidden, and output layers 90. The neural network is exposed to input-output pairs and is trained to predict the output variables. The training process establishes the connection strengths among the processing neurons by means of an appropriate training algorithm. The connection weights and the biases between the layers thus act on the input signals. An activation function transforms the sum of these signals, and training aims to minimize the disparity between the predicted output and the actual output data. The commonly used activation functions include purelin (Eq. 2), logsig (Eq. 3), and tansig (Eq. 4).
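For reference, the three activation functions named above have simple closed forms. The sketch below uses their usual definitions (linear, logistic sigmoid, and hyperbolic tangent sigmoid) in plain Python; it is an illustration rather than the toolbox code.

```python
import math

def purelin(x):
    """Linear transfer function: output equals input."""
    return x

def logsig(x):
    """Log-sigmoid: maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Tan-sigmoid: maps any real input to (-1, 1); equals tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0
```

These scalar forms overflow in `math.exp` for very large negative inputs, so a production version would clip or branch on the sign of `x`.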
Prominent types of ANNs include the radial basis function (RBF) and the multilayer perceptron (MLP) 91 .It is crucial to highlight the key distinction between these networks, which lies in the functioning of neurons.
The RBF-ANN architecture consists of an input layer, a hidden layer, and an output layer. The neurons within the hidden layer utilize radial basis functions as their activation functions. Through the utilization of a linear optimization strategy and the adjustment of weights during the minimization of mean square error, this algorithm can ascertain the optimal solution.
As mentioned earlier, the multilayer perceptron (MLP) is an alternative form of ANN. This algorithm comprises multiple layers, with the input layer being the first and the output layer being the last. Intermediate and hidden layers connect the input and output layers, where various forms of activation functions can be applied 92. Additional information about MLP and RBF algorithms can be found in the literature 93,94.

MLP training algorithms
The algorithm's learning process involves the forward propagation of data and the backward propagation of errors. Input data enters the model through the input layer without initial processing, undergoes initial processing in the hidden layer, and then gets to the output layer. If the difference between the network's predictions and the actual outputs does not meet the required level of accuracy, an error will be backpropagated through the network for further adjustment.
The backpropagation of errors works by propagating the difference between the network's output and the actual output back through the hidden layer to the input layer. The network's training procedure continues until the error between the network result and the actual output falls within the allowed tolerance or reaches a predetermined number of learning cycles 28.
Furthermore, there are six distinct classes of backpropagation algorithms: adaptive momentum, self-adaptive learning rate, resilient backpropagation, conjugate gradient, Quasi-Newton, and Bayesian regularization 95 .

Adaptive momentum
The gradient descent (GD) algorithm is employed to determine an optimal set of internal variables for model optimization in machine learning and deep learning problems. Typically, gradient descent involves three steps: (1) initializing the internal variables, (2) evaluating the model using the internal variables and the loss function, and (3) updating the internal variables in a manner that moves toward optimal points. The gradient descent technique is an iterative process, as shown in Eq. (5): $w_{i+1} = w_i - \eta \nabla f(w_i)$.
In the given equation, w_i represents the set of variables that need to be updated, ∇f(w_i) represents the gradient of the loss function f with respect to the set of variables w_i, and η denotes the learning rate. The value of η can be a constant or can be determined through a one-dimensional optimization along the training direction at each step. The primary objective of gradient descent is to locate the points that minimize the loss function, which makes this process essential in the optimization procedure 96.
Gradient descent with momentum backpropagation (GDM) is a training algorithm that utilizes batch steepest descent with an enhanced convergence rate. It incorporates momentum so that the update responds to recent trends in the error surface as well as to the local gradient, thereby mitigating the risk of getting stuck in shallow local minima. By employing momentum, GDM achieves faster convergence during the training process 97.
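The gradient descent update and its momentum variant can be illustrated on a small quadratic loss. The sketch below is a generic classical-momentum formulation written for this illustration; the function names, the test function, and the default values are assumptions, and the exact update rule of the toolbox's GDM implementation may differ.

```python
def gradient_descent(grad, w0, lr=0.04, momentum=0.0, steps=500):
    """Minimize a function given its gradient.

    With momentum=0 this is plain gradient descent:
        w <- w - lr * grad(w)
    With momentum>0 a velocity term accumulates past updates.
    """
    w = list(w0)
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [wi + vi for wi, vi in zip(w, velocity)]
    return w

def quad_grad(w):
    """Gradient of f(w) = (w[0] - 3)^2 + 10 * (w[1] + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w_plain = gradient_descent(quad_grad, [0.0, 0.0])
w_momentum = gradient_descent(quad_grad, [0.0, 0.0], momentum=0.9)
```

Both runs approach the minimizer (3, -1). For this test function a learning rate of 0.1 or more makes the plain update oscillate or diverge along the steep second coordinate, which is the sensitivity that the adaptive schemes address.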

Self-adaptive learning rate
The efficacy of the algorithm relies on the appropriate configuration of the learning rate. If the learning rate is set too high, it can result in instability, while setting it too low can lead to slow convergence. To enhance the performance of the gradient descent algorithm, an adaptive learning rate can be utilized, allowing for adjustments during training. The primary objective of an adaptive learning rate is to maintain the largest possible learning step size while ensuring stability in the learning process 98.
The conventional steepest descent (GD) backpropagation algorithm employs a fixed learning rate parameter during the network's training process. Nevertheless, the algorithm's performance relies significantly on the specific value assigned to this parameter. To address this, the gradient descent with adaptive learning rate backpropagation (GDA) algorithm was created, enabling an adaptive adjustment of the learning rate. This adaptive strategy strives to maximize the magnitude of each learning step while maintaining the stability of the learning process. In the GDA algorithm, the optimal value of the learning rate varies with the trajectory of the gradient on the error surface 97. The training algorithm referred to as gradient descent with momentum and adaptive learning rate backpropagation (GDX) integrates both adaptive learning rate and momentum training principles. It is similar to the GDA algorithm, but with the addition of a momentum coefficient as a training parameter. Consequently, the weight vector is updated as in GDM, while the learning rate varies as in GDA.
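One common way to realize an adaptive learning rate is sketched below: a trial step is rejected and the rate reduced when the loss grows by more than a tolerance, and the rate is increased after a step that lowers the loss. The parameter names and the values 1.05, 0.7, and 1.04 are assumptions chosen for illustration, not settings reported in this study.

```python
def gda(loss, grad, w0, lr=0.01, lr_inc=1.05, lr_dec=0.7,
        max_increase=1.04, steps=300):
    """Gradient descent with an adaptive learning rate (illustrative)."""
    w = list(w0)
    current = loss(w)
    for _ in range(steps):
        g = grad(w)
        trial = [wi - lr * gi for wi, gi in zip(w, g)]
        trial_loss = loss(trial)
        if trial_loss > max_increase * current:
            # Loss grew too much: discard the step and shrink the rate.
            lr *= lr_dec
        else:
            if trial_loss < current:
                # Loss decreased: allow a slightly larger step next time.
                lr *= lr_inc
            w, current = trial, trial_loss
    return w, lr

def quad_loss(w):
    return (w[0] - 3.0) ** 2 + 10.0 * (w[1] + 1.0) ** 2

def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w_short, lr_short = gda(quad_loss, quad_grad, [0.0, 0.0], steps=47)
```

Starting from a deliberately small rate of 0.01, the rate grows by 5% per successful step; once it crosses the stability limit of the test function, rejected steps pull it back down.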

Resilient backpropagation (RP)
Resilient backpropagation (RP) is typically applied to eliminate the negative consequences of the magnitudes of the partial derivatives. This algorithm has the advantage of being significantly quicker than the standard steepest descent algorithm 95. In the hidden layers of multilayer networks, sigmoid activation functions are typically utilized to restrict the output range. As the magnitude of the input increases, the slope of the sigmoid functions approaches zero. This poses a challenge when training a multilayer network using gradient descent and sigmoid functions, as the gradients can become exceedingly small, resulting in only small changes to the weights and biases even when they are far from their optimal values. The resilient backpropagation training algorithm aims to mitigate the adverse impact of small partial derivatives. In this approach, only the sign of the derivative is used to determine the direction of the weight update, while the magnitude of the derivative does not affect the update. The magnitude of the weight change is determined by a separate update value 98.
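The sign-based update described above can be sketched as follows. This is a simplified variant without weight backtracking, written for illustration; the increase and decrease factors (1.2 and 0.5) and the step bounds are commonly used values that are assumed here, not taken from the study.

```python
def rprop(grad, w0, step0=0.1, inc=1.2, dec=0.5,
          step_min=1e-6, step_max=50.0, iters=100):
    """Resilient backpropagation sketch: only gradient signs are used."""
    w = list(w0)
    steps = [step0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(iters):
        g = grad(w)
        for j in range(len(w)):
            if g[j] * prev[j] > 0:
                # Same sign as before: grow this weight's update value.
                steps[j] = min(steps[j] * inc, step_max)
            elif g[j] * prev[j] < 0:
                # Sign flipped (a minimum was overshot): shrink it.
                steps[j] = max(steps[j] * dec, step_min)
            # Move against the sign of the derivative; its size is ignored.
            if g[j] > 0:
                w[j] -= steps[j]
            elif g[j] < 0:
                w[j] += steps[j]
        prev = g
    return w

def quad_grad(w):
    """Gradient of f(w) = (w[0] - 3)^2 + 10 * (w[1] + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w_rprop = rprop(quad_grad, [0.0, 0.0])
```

Because the gradient magnitude never enters the update, the two coordinates converge at similar speeds even though their curvatures differ by a factor of ten.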

Conjugate gradient
The conjugate gradient algorithm, which can be regarded as intermediate between gradient descent and Newton's method, enhances the convergence rate of artificial neural networks without the requirement to evaluate, store, and invert the Hessian matrix. It searches along conjugate directions, which generally leads to faster convergence than the directions followed by gradient descent. The algorithm constructs a sequence of training directions, each combining the current gradient with the previous training direction.
Here y represents the training direction vector and c denotes the conjugate parameter; the primary training direction vector y_0 is set to the negative of the gradient in all cases 96. In the parameter update of the conjugate gradient algorithm, η_i is the learning rate, which is normally determined by line minimization.
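In the notation of the nomenclature, the direction sequence and the parameter update described above are commonly written in the following standard form. This is a reconstruction from the symbol definitions, given as an assumption rather than a verbatim copy of the original equations:

$$y_{i+1} = -\nabla f(w_{i+1}) + c_i \, y_i, \qquad y_0 = -\nabla f(w_0)$$

$$w_{i+1} = w_i + \eta_i \, y_i$$

The individual conjugate gradient variants differ only in how the conjugate parameter $c_i$ is computed.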
The standard backpropagation algorithm modifies weights in the direction of steepest descent, but this does not guarantee the quickest convergence. Conjugate gradient algorithms expedite convergence by searching along conjugate directions. The first iteration searches in the steepest descent direction; a line search is then performed, and each new search direction is obtained by combining the new steepest descent direction with the previous search direction. The new search direction depends on a constant, which in the Fletcher-Reeves update [conjugate gradient backpropagation with Fletcher-Reeves update (CGF) algorithm] is calculated as the ratio of the squared norm of the current gradient to the squared norm of the previous gradient 99. In the Polak-Ribière update [conjugate gradient backpropagation with Polak-Ribière update (CGP) algorithm], the constant is calculated as the inner product of the previous gradient change and the current gradient, divided by the squared norm of the previous gradient. The Polak-Ribière update requires marginally more storage than the Fletcher-Reeves method, which stores three vectors 97.
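The two update constants can be compared on a small quadratic problem, where the line search has a closed form. The sketch below is an illustration in plain Python, not a network training routine; the function names and the 2×2 test system are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, w0, update="FR", iters=None, tol=1e-24):
    """Minimize f(w) = 0.5 * w^T A w - b^T w for symmetric positive
    definite A, using Fletcher-Reeves ("FR") or Polak-Ribiere ("PR")."""
    w = list(w0)
    g = [r - bi for r, bi in zip(matvec(A, w), b)]   # gradient A w - b
    y = [-gi for gi in g]                            # first direction: -g
    for _ in range(iters or len(w)):
        if dot(g, g) < tol:
            break
        Ay = matvec(A, y)
        eta = -dot(g, y) / dot(y, Ay)                # exact line minimization
        w = [wi + eta * yi for wi, yi in zip(w, y)]
        g_new = [gi + eta * ai for gi, ai in zip(g, Ay)]
        if update == "FR":
            c = dot(g_new, g_new) / dot(g, g)
        else:
            diff = [gn - go for gn, go in zip(g_new, g)]
            c = dot(g_new, diff) / dot(g, g)
        y = [-gn + c * yi for gn, yi in zip(g_new, y)]
        g = g_new
    return w

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
w_fr = conjugate_gradient(A, b, [2.0, 1.0], update="FR")
w_pr = conjugate_gradient(A, b, [2.0, 1.0], update="PR")
```

On a quadratic with exact line searches both constants coincide, and the minimizer (1/11, 7/11) is reached in two iterations. On the non-quadratic error surface of a neural network the two updates generally produce different directions.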
In every conjugate gradient algorithm, the search direction is periodically reset to the negative of the gradient. The typical reset point occurs when the number of iterations equals the number of network parameters (weights and biases), although other reset techniques can enhance training effectiveness. The Powell-Beale restart [conjugate gradient backpropagation with Powell-Beale restarts (CGB) algorithm] is an example of such a reset technique. Powell introduced this restart strategy, building upon an earlier suggestion from Beale. The strategy triggers a restart when there is limited orthogonality between the current gradient and the previous gradient. The Powell-Beale algorithm demands slightly greater storage capacity than Polak-Ribière 97.
The three previously discussed conjugate gradient techniques require a line search at each iteration, which can be computationally expensive because the network output must be computed for all training inputs several times. To address this issue and significantly reduce the number of calculations required per iteration, the scaled conjugate gradient backpropagation (SCG) training technique was developed. However, SCG may require more iterations than the other conjugate gradient algorithms to achieve convergence. The storage requirements of the SCG algorithm are comparable to those of the CGF algorithm. In the majority of problems, SCG exhibits superlinear convergence. It is at least an order of magnitude faster than the standard backpropagation algorithm. By using a step-size scaling mechanism, SCG avoids a time-consuming line search in each learning iteration, making the algorithm faster than other recently proposed second-order algorithms 97.

Quasi-Newton
Quasi-Newton methods, a subset of variable metric techniques, are employed to identify local extremum points of functions. These methods draw their inspiration from Newton's method, designed to pinpoint stationary points of a function where the gradient equals zero. Newton's method assumes that the function can be locally approximated as a quadratic function in the vicinity of the optimal point. To accomplish this, it relies on the utilization of both the first and second derivatives of the function. In cases involving higher dimensions, Newton's method extends its application by incorporating the gradient and the Hessian matrix, which encapsulates the second derivatives of the function, with the objective of function minimization 100.
Newton's method presents an alternative to conjugate gradient methods, known for its rapid convergence and optimization capabilities. It involves the computation of the Hessian matrix, which leads to faster convergence compared to conjugate gradient methods. However, calculating the Hessian matrix for feedforward neural networks is challenging and computationally expensive. The BFGS Quasi-Newton backpropagation (BFGS) algorithm is well-suited for smaller networks, although it requires more storage and computational resources due to its complexity and cost 95.
The one step secant backpropagation (OSS) training technique strikes a balance between conjugate gradient algorithms and full quasi-Newton algorithms. It reduces the storage and computational requirements per iteration by not storing the complete Hessian matrix; instead, it assumes that the previous Hessian at each iteration was the identity matrix. This approach determines the new search direction without explicitly calculating a matrix inverse. However, it requires slightly more computation and storage per iteration than conjugate gradient methods 97,99.

Levenberg-Marquardt backpropagation (LM)
In many problems, the Levenberg-Marquardt (LM) algorithm outperforms standard gradient descent and conjugate gradient methods. LM combines the local search features of the Gauss-Newton method with the consistent error reduction afforded by the gradient descent algorithm. Feedforward network training based on LM is treated as an unconstrained optimization problem. The Levenberg-Marquardt algorithm was developed to approach the training speed of second-order methods without explicitly computing the Hessian matrix. When the performance function of feedforward networks can be expressed as a sum of squares, the Hessian matrix can be estimated using the following approximation 101:

$$H = J^{T} J$$

The gradient can be calculated as:

$$g = J^{T} e$$

In this context, the Jacobian matrix J comprises the first derivatives of the network errors with respect to the weights and biases, while e represents the vector of network errors. The Jacobian matrix can be obtained using a standard backpropagation technique, which is notably less intricate than the computation of the Hessian matrix. The Levenberg-Marquardt algorithm uses this approximation of the Hessian matrix in the following Newton-like update 98:

$$w_{k+1} = w_k - \left[ J^{T} J + \mu I \right]^{-1} J^{T} e$$
When the scalar μ is zero, the update is simply Newton's method with the approximate Hessian matrix. When μ is large, the update becomes gradient descent with a small step size. Newton's method exhibits faster and more accurate convergence near an error minimum; thus, the goal is to transition to Newton's method as early as possible. Accordingly, μ is decreased after each successful step (a reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is reduced at every iteration of the algorithm 98.
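The damping schedule described above can be illustrated with a two-parameter curve fit, small enough that the damped normal equations are solved by hand. The model y = a·exp(b·x), the synthetic data, the factor of 10 used to adjust μ, and all names are assumptions made for this sketch; it is not the network training code of the study.

```python
import math

def lm_fit(xs, ys, a, b, mu=1e-3, iters=100):
    """Levenberg-Marquardt sketch for the model y = a * exp(b * x)."""
    def sse(a_, b_):
        return sum((a_ * math.exp(b_ * x) - y) ** 2 for x, y in zip(xs, ys))

    err = sse(a, b)
    for _ in range(iters):
        # Accumulate J^T J (h11, h12, h22) and J^T e (g1, g2).
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            ex = math.exp(b * x)
            e = a * ex - y          # network-style error for this point
            j1 = ex                 # d e / d a
            j2 = a * x * ex         # d e / d b
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
            g1 += j1 * e
            g2 += j2 * e
        # Solve (J^T J + mu * I) * delta = J^T e for the 2x2 case.
        a11, a22 = h11 + mu, h22 + mu
        det = a11 * a22 - h12 * h12
        da = (a22 * g1 - h12 * g2) / det
        db = (a11 * g2 - h12 * g1) / det
        new_err = sse(a - da, b - db)
        if new_err < err:
            # Successful step: accept it and move toward Gauss-Newton.
            a, b, err = a - da, b - db, new_err
            mu *= 0.1
        else:
            # Failed step: reject it and move toward gradient descent.
            mu *= 10.0
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic, noise-free data
a_fit, b_fit = lm_fit(xs, ys, a=1.0, b=0.3)
```

From the starting guess the first few tentative steps are rejected, μ rises until a short gradient-like step is accepted, and μ then falls again so that the final iterations behave like Gauss-Newton.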

Bayesian regularization backpropagation (BR)
The conventional backpropagation (BP) algorithm can encounter the problem of overfitting, which manifests as reduced bias and increased variance. Conversely, the Bayesian regularization of artificial neural networks (BRANN) exhibits superior generalization capabilities. The BRANN minimizes the objective function F, which combines the mean squared error function E_D and the weight attenuation function E_W. It determines the optimal weights and the parameters of the objective function probabilistically. The objective function of the BRANN is a weighted combination of E_D and E_W 102. In this objective function, α and β are hyperparameters that control the distribution of the other parameters, w represents the weights, and m denotes the number of these weights. D refers to the training set data, represented as (x_i, t_i), where i ranges from 1 to N, the total number of input-output pairs in the training set. y_i represents the output value corresponding to the i-th input-output pair in the training set 102. The ANN model should produce nearly identical error rates for training and test data. Regularization is a technique that forces a neural network to converge to a set of weights and biases with smaller values. This makes the network's response smoother and reduces the likelihood of overfitting.
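With the symbols defined above, the regularized objective is commonly written in the following form. This is a reconstruction from the definitions in the text; in particular, the normalization by N and m is an assumption:

$$F = \beta E_D + \alpha E_W$$

$$E_D = \frac{1}{N} \sum_{i=1}^{N} \left( t_i - y_i \right)^2, \qquad E_W = \frac{1}{m} \sum_{j=1}^{m} w_j^2$$

A larger ratio $\alpha / \beta$ places more emphasis on small weights and therefore on a smoother network response, while a smaller ratio emphasizes fitting the training data.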

Development of optimal ANN structure
The process of constructing MLP and RBF neural networks is demonstrated step by step in Figs. 2 and 3, respectively. This process typically involves determining the network's architecture, training the network, and evaluating its performance. A trial-and-error methodology was employed to identify the optimal structure for the artificial neural network (ANN) 65. The optimal ANN structure was then selected on the basis of the lowest mean square error (MSE) (Eq. 15), the highest Pearson's linear correlation coefficient (R) (Eq. 16), and the lowest average absolute relative deviation percentage (AARD%) (Eq. 17) 103. Here α_exp represents the data points obtained from experimental measurements, while α_cal represents the corresponding data points calculated by the models. N denotes the total number of data points.
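The three selection criteria can be computed as sketched below. The formulas are the usual definitions of MSE, Pearson's R, and AARD% and are assumed to match Eqs. 15-17; the function names and sample values are hypothetical.

```python
import math

def mse(exp, cal):
    """Mean squared error between experimental and calculated values."""
    return sum((e - c) ** 2 for e, c in zip(exp, cal)) / len(exp)

def pearson_r(exp, cal):
    """Pearson's linear correlation coefficient R."""
    n = len(exp)
    me, mc = sum(exp) / n, sum(cal) / n
    cov = sum((e - me) * (c - mc) for e, c in zip(exp, cal))
    ve = sum((e - me) ** 2 for e in exp)
    vc = sum((c - mc) ** 2 for c in cal)
    return cov / math.sqrt(ve * vc)

def aard_percent(exp, cal):
    """Average absolute relative deviation, in percent.
    Experimental values must be non-zero."""
    return 100.0 / len(exp) * sum(abs((e - c) / e) for e, c in zip(exp, cal))

# Hypothetical values for illustration only.
alpha_exp = [2.0, 4.0]
alpha_cal = [1.0, 5.0]
```

Because AARD% divides by the experimental value, data points with very small uptake can dominate it, which is one reason to report it alongside MSE and R.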
In prior studies 103,104, researchers first fixed a network topology and explored several training algorithms on that specific structure to identify the best training algorithm. After identifying the most effective learning algorithm, they determined the optimal network configuration by varying the number of neurons and layers in the network. This approach discards an essential part of the search space, namely networks that perform better with a different training algorithm but have a structure different from the predetermined one. It is therefore important to consider a broader range of network architectures in order to discover superior configurations that may be missed in such an initial exploration.
In this study, the objective of the proposed method was to highlight the impact of the learning algorithm on the optimal network configuration. To achieve this, an optimal model was first developed for each training algorithm. Subsequently, the most suitable network architecture and the best selection of activation functions were determined for each model. This sequential approach enabled a systematic exploration of the impact of training algorithms on the identification of ideal network configurations within our specific domain of study. Thirteen different backpropagation training algorithms were applied to train MLP neural networks. To determine the optimal neural network architecture for each training algorithm, two hidden layers, each with zero to 50 neurons, were considered. Suitable activation functions were identified by comparing four distinct combinations of logsig and tansig functions in the hidden layers of each algorithm's optimal architecture. The initial weights and biases were assigned randomly using MATLAB software. To mitigate the impact of the initial weights and biases on the outcomes, each MLP topology was run at least three times, and only the best result was considered. This approach was employed to propose a model with more precise performance that accounts for variations in the training process.
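The search procedure described above (topology first, then activation functions, with repeated random initializations) can be outlined as follows. The callable `train_and_score` is a hypothetical stand-in for training one network and returning its MSE; the coarse step of 10 neurons, the default tansig/tansig pairing during the topology search, and the treatment of zero neurons as an absent second layer are assumptions made to keep the sketch small.

```python
import itertools

ACTIVATIONS = ("logsig", "tansig")

def best_of_restarts(train_and_score, n1, n2, acts, restarts):
    """Train the same topology several times; keep the lowest MSE."""
    return min(train_and_score(n1, n2, acts[0], acts[1], seed)
               for seed in range(restarts))

def search(train_and_score, max_neurons=50, step=10, restarts=3):
    """Two-stage search for one training algorithm (illustrative)."""
    first = range(step, max_neurons + 1, step)   # first hidden layer
    second = range(0, max_neurons + 1, step)     # 0 = no second layer
    default_acts = ("tansig", "tansig")
    # Stage 1: best topology with fixed activation functions.
    topo_scores = {
        (n1, n2): best_of_restarts(train_and_score, n1, n2,
                                   default_acts, restarts)
        for n1, n2 in itertools.product(first, second)
    }
    best_topo = min(topo_scores, key=topo_scores.get)
    # Stage 2: four logsig/tansig combinations on that topology.
    act_scores = {
        acts: best_of_restarts(train_and_score, best_topo[0], best_topo[1],
                               acts, restarts)
        for acts in itertools.product(ACTIVATIONS, repeat=2)
    }
    best_acts = min(act_scores, key=act_scores.get)
    return best_topo, best_acts, act_scores[best_acts]
```

Running this loop once per training algorithm and comparing the winners by accuracy, runtime, and epochs mirrors the comparison reported in the results.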
As with the MLP neural network, there are no specific rules for determining the optimal architecture of the RBF network. In the RBF network, the number of hidden layers is fixed at one, so the only structural parameter to be determined is the number of neurons in the hidden layer. This parameter is established by trial and error. In our research, a Gaussian radial basis activation function is employed, and the spread value is chosen according to the desired width of the Gaussian function.
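The role of the spread can be illustrated with a short Python sketch of a Gaussian hidden neuron and a single-hidden-layer RBF network with a linear output. The scaling used here, in which the neuron response falls to 0.5 at a distance equal to the spread, is an assumed convention and is not stated in the text. The function names, centers, and weights are hypothetical.

```python
import math

def gaussian_rbf(x, center, spread):
    """Gaussian radial basis response of one hidden neuron.

    Assumed scaling: the output is 1.0 at the center and 0.5 when the
    Euclidean distance between x and center equals `spread`.
    """
    dist = math.dist(x, center)
    b = math.sqrt(math.log(2.0)) / spread  # about 0.8326 / spread
    return math.exp(-(b * dist) ** 2)

def rbf_network(x, centers, weights, bias, spread):
    """Output of an RBF network: weighted sum of hidden responses plus bias."""
    hidden = [gaussian_rbf(x, c, spread) for c in centers]
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Illustrative values only: two hidden neurons and a spread of 9.
y = rbf_network([0.0, 0.0], [[0.0, 0.0], [9.0, 0.0]], [2.0, 4.0], 1.0, 9.0)
print(y)
```

A larger spread makes each neuron respond over a wider region of the input space, which is why the spread and the number of hidden neurons have to be tuned together.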

Best MLP model
Several MLP structures were investigated for each training algorithm to determine the optimal ANN for predicting CO2 adsorption capacity. Figure 4 displays the best mean square error (MSE) value achieved for each topology. The analysis of Fig. 4 suggests that a more complex neural network architecture with multiple ... In stark contrast, the GDM (gradient descent with momentum) algorithm performs less accurately, registering the lowest accuracy among the algorithms examined. The superior accuracy of the LM algorithm can be attributed to its adaptability, its use of second-order information, and its efficient optimization in complex landscapes. Conversely, the lower accuracy of the GDM algorithm may result from its reliance on fixed learning rates, its sensitivity to initialization, and a greater tendency to become trapped in local minima.
The SCG algorithm, with a remarkably short runtime of 0.7640 s, is the most time-efficient method for training neural networks in this study. In contrast, the BFG algorithm has the longest training time, requiring 56.6110 s to complete network training. This large gap in runtime reflects a significant disparity in computational efficiency between the two optimization algorithms. The difference in training time between SCG and BFG likely arises from a combination of algorithmic differences, problem-specific factors, and the chosen settings or hyperparameters.
The LM training algorithm converges efficiently within a relatively small number of epochs, specifically 82. In contrast, both the GDM and GD training algorithms reached the predefined maximum of 5000 epochs without achieving the desired convergence. This disparity underscores the distinct convergence behaviors of these algorithms. The ability of the LM algorithm to converge within a limited number of epochs suggests its effectiveness in optimizing neural networks, while the protracted training observed for GDM and GD may indicate difficulty in navigating the optimization landscape.
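The rapid convergence of the LM algorithm can be related to its update rule. The following is a sketch of the standard textbook form, which is not necessarily the exact variant implemented in the software used here. J is the Jacobian matrix of the network errors with respect to the weights, e is the vector of network errors, H is the Hessian matrix estimate, g is the gradient, and w_i denotes the weights at iteration i. The damping parameter \mu and the identity matrix I are symbols assumed here for illustration.

```latex
\begin{aligned}
H &\approx J^{T} J, \qquad g = J^{T} e, \\
w_{i+1} &= w_i - \left( J^{T} J + \mu I \right)^{-1} J^{T} e .
\end{aligned}
```

For large \mu the step approaches a small gradient-descent step, whereas for \mu close to zero it approaches a Gauss-Newton step, which uses approximate second-order information and typically needs far fewer iterations near a minimum.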
Figure 5 compares the neural networks trained with the various training algorithms in terms of accuracy, run time, and number of epochs. This comparison provides insight into the trade-offs between these critical aspects of algorithm performance.
As previously stated, the LM backpropagation algorithm was identified as the most effective training method owing to its low mean square error (MSE < 2.6293E−05) and high correlation coefficient (R > 0.9951). It was therefore selected as the training algorithm for building the MLP neural network model to simulate and predict carbon dioxide adsorption. Figure 6 depicts the structure of the optimal MLP network obtained.
Figure 7 depicts the variation in mean square error with the number of epochs; the optimal MLP model reaches its best validation performance (3.6342E−05) at 72 epochs. The error histogram in Fig. 8 illustrates the performance of the neural network. Comparing the experimental data with the data modeled by the MLP neural network in Fig. 9 shows that the two are highly congruent. Experimental values are always associated with some error, so data with lower error are needed for a more accurate network. Nevertheless, the correlation coefficients (R) for network training, validation, test, and the total data set were 0.99614, 0.99441, 0.99142, and 0.99512, respectively, underscoring the reliability and accuracy of the chosen neural network model. This demonstrates the consistency of the model's performance across different data subsets and reinforces the robustness of the findings of this study. Therefore, neural networks are appropriate for modeling CO2 adsorption on carbon-based adsorbents.

Best RBF model
As previously emphasized, the optimal value of the spread parameter of the radial basis function must be determined for the RBF neural network. According to Table 5, a spread value of 10 gives the lowest observed mean squared error (MSE). The hidden layer of this network contains 302 neurons, a relatively large number compared with the other network configurations, yet this larger neuron count does not yield a substantially different MSE. In the network with a spread of 9, by contrast, the MSE is slightly higher, but the hidden layer contains only 207 neurons, which reduces both computational time and computational cost. Moreover, this network has neither an excessively large nor an excessively small number of hidden-layer neurons compared with the alternative configurations, and its accuracy is high, making it the preferred choice as the optimal RBF model. The chosen RBF network configuration is depicted in Fig. 10. The change in mean square error is displayed in Fig. 11, in which the optimal RBF model with 207 neurons in the hidden layer exhibits the best performance (9.8402E−05). In addition, the regression diagram in Fig. 12 shows moderately good agreement between the RBF outputs and the experimental data, with R equal to 0.98145.

Prediction of CO 2 adsorption with new data
To evaluate the efficacy of the neural network models, the MLP and RBF models obtained were applied to 45 new data points (which had initially been set aside from the data set), and the predicted CO2 adsorption was compared with the experimental values. The results of CO2 adsorption prediction by the MLP neural network models with various training algorithms are displayed in Table 6. The LM algorithm shows the highest accuracy among all models, with the lowest AARD% value of 2.80 and the highest correlation coefficient of 0.9993. The BR algorithm also performs well, with an AARD% of 4.27 and a correlation coefficient of 0.9988. The results of the RBF neural network models with varying spread values are presented in Table 7. The model with a spread of 9 achieves the lowest AARD% (13.41%) and thus the highest accuracy among the RBF models. Figure 13 illustrates the linear regression between the experimental CO2 adsorption values and the neural network outputs for both the MLP and RBF models, using the new data.
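The two metrics used in this comparison can be computed as in the Python sketch below. It assumes the usual definitions: AARD% as the mean of |predicted − experimental| / |experimental| multiplied by 100, and R as the Pearson correlation coefficient. The function names and the four data points are hypothetical and are not taken from the 45 held-out samples.

```python
import math

def aard_percent(experimental, predicted):
    """Average absolute relative deviation between two series, in percent."""
    terms = [abs(p - e) / abs(e) for e, p in zip(experimental, predicted)]
    return 100.0 * sum(terms) / len(terms)

def pearson_r(experimental, predicted):
    """Pearson correlation coefficient between two equally long series."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p)
              for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_e * var_p)

# Illustrative values only (not data from the study).
experimental = [2.0, 4.0, 5.0, 10.0]
predicted = [2.1, 3.8, 5.0, 10.5]
print(aard_percent(experimental, predicted))  # roughly 3.75
print(pearson_r(experimental, predicted))
```

Because AARD% weights every point by its own magnitude, it penalizes errors at low uptake more strongly than MSE does, which is one reason to report it alongside R.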

Comparing MLP and RBF
Modeling and simulation of carbon dioxide adsorption on carbon-based adsorbents by neural networks revealed that the MLP network (with the LM training algorithm) agrees with the experimental values more closely than the RBF network. Table 8 provides the MSE values and correlation coefficients obtained from simulation and from prediction with new data for both networks. The MLP deep neural network is more appropriate for modeling and simulating this process than the RBF network because of its higher correlation coefficient and lower mean square error. As previously stated, the relation of Durá et al. 45 predicts the quantity of carbon dioxide adsorbed from the micropore and mesopore volumes. That model predicts the amount of carbon dioxide adsorbed on 12 distinct adsorbents with a squared correlation coefficient of 0.9829. With a correlation coefficient of 0.9951 for more than 200 adsorbents at varying temperatures and pressures, the MLP deep network model obtained in this study is evidently more accurate and efficient.
Figure 14 shows a notable upward trend in adsorption with increasing pressure, in line with the findings of pertinent studies 24,105. Conversely, as the temperature increases to 120 °C, a slight reduction in adsorption becomes apparent. This decrease can be attributed to the exothermic nature of the adsorption process, whereby the concentration of adsorbed gas on the adsorbent surface diminishes as the temperature rises 24,40. According to Fig. 14, the highest levels of CO2 adsorption are observed in the pressure range of 30-50 bar and the temperature range of 0-20 °C.
The carbon dioxide adsorption characteristics at 25 °C, a BET surface area of 500 m2/g, and pressures of 1, 5, 15, and 20 bar are presented in Fig. 15, with a focus on the roles of mesopores and micropores. At 1 bar, the influence of micropore volume in the range of 0.6-1.2 cm3/g on carbon dioxide adsorption dominates (Fig. 15a). This trend is sustained up to 5 bar, where micropores still play a significant role (Fig. 15b). However, as the pressure increases to 15 and 20 bar, the mesopore volume becomes more prominent (Fig. 15c,d). At 20 bar in particular, adsorption grows substantially within the mesopore volume range of 4-8 cm3/g. These findings are in line with prior research 37,40,42,43, which suggests that carbon dioxide adsorption is governed primarily by micropore volume at lower pressures and by mesopore volume at higher pressures. This shift may be attributed to the saturation of micropores at higher pressures, which necessitates a contribution from mesopores to achieve higher CO2 uptake 43.
Figure 16 presents carbon dioxide adsorption on the adsorbent as a function of BET surface area, temperature, and pressure. Figure 16a shows the results at a fixed pressure of 5 bar, with micropore and mesopore volumes of 0.53 and 0.75, respectively. Under these conditions, the maximum CO2 adsorption occurred at lower temperatures, from 0 to 60 °C, together with a higher BET surface area, from 2000 to over 3500 m2/g. In general, carbon dioxide adsorption decreased as the temperature increased. As the BET surface area approached 2000 m2/g, adsorption increased notably and then declined modestly, although it remained elevated. Figure 16b presents the findings at a constant temperature of 25 °C, with micropore and mesopore volumes of 0.75 and 4.5, respectively. Here, peak adsorption occurred in the pressure range of 30-50 bar, together with a BET surface area of 1000 to 2500 m2/g. Overall, carbon dioxide adsorption correlated positively with both pressure and BET surface area. Nevertheless, an increase in surface area did not consistently produce an increase in adsorption at all pressure levels. This observation suggests that a large specific surface area does enhance the CO2 adsorption capacity, but only within a specific range of CO2 pressures 37.

Evaluation of adsorption factors
Figure 17 shows the influence of BET surface area, mesopore volume, and micropore volume on CO2 adsorption. In Fig. 17a, carbon dioxide adsorption increases with increasing BET surface area and mesopore volume at a micropore volume of 0.45 cm3/g, a temperature of 25 °C, and a pressure of 5 bar. The maximum carbon dioxide adsorption occurs for mesopore volumes of 1 to 7 cm3/g and BET surface areas of 2500-3000 m2/g. In Fig. 17b, obtained at 25 °C and 15 bar with a mesopore volume of ... cm3/g, the maximum carbon dioxide adsorption lies in the region of micropore volumes from 0.4 to 0.8 and BET surface areas from 3000 to 3700 m2/g. Substantial adsorption is also observed for micropore volumes from 1 to 1.4 and BET surface areas from 1500 to 3500 m2/g. These findings point to the distinct influence of micropore volume at lower BET surface areas and the stronger impact of BET surface area at lower micropore volumes.
It should be emphasized that synthesizing porous carbon materials that combine a high BET surface area with a low micropore volume is very challenging, because a high BET surface area is generally indicative of a substantial micropore volume, as noted in previous research 44.

Conclusion
This study modeled carbon dioxide adsorption on carbon-based adsorbents using multilayer perceptron (MLP) and radial basis function (RBF) neural networks. BET surface area, mesopore volume, micropore volume, temperature, and pressure served as the input variables. After various training algorithms and activation functions were evaluated, the Levenberg–Marquardt backpropagation algorithm with 'tansig' activation in the hidden layers and a linear output was identified as the optimal configuration for the MLP models. The best MLP and RBF models achieved mean square error (MSE) values of 2.6293E−5 and 9.8401E−5, respectively. The MLP deep neural network with the LM and BR training algorithms outperformed the RBF network, achieving a correlation coefficient of 0.9951 across a data set of over 200 adsorbents. The study also revealed the significant influence of micropore volume at lower pressures and of mesopore volume at higher pressures on CO2 uptake. By building on prior research to connect the textural properties of adsorbents with operating conditions, the study contributes a comprehensive and efficient model for predicting carbon dioxide adsorption on porous carbons.

Figure 1. Pearson correlation matrix between any two variables of porous carbon adsorbents and CO2 adsorption capacity based on the total database.

Figure 2. Schematic view of the MLP creation steps.

Figure 3. Schematic view of the RBF creation steps.

Figure 5. Comparing the efficacy of neural networks trained with various training algorithms with respect to: (a) accuracy, (b) run time, (c) number of epochs.

Figure 6. The structure of the optimal MLP network (trained with the LM algorithm).

Figure 7. MSE by number of epochs for data sets in the MLP network.

Figure 8. Error histogram plot for MLP network data sets.
Figure 9.

Figure 14 exhibits a three-dimensional graphical representation of the relationship between carbon dioxide adsorption, temperature, and pressure. This depiction assumes that the mesopore volume, micropore volume, and BET surface area remain constant at 0.75, 0.53, and 1510, respectively.

Figure 10. The structure of the optimal RBF network.

Figure 11. Variations in the MSE value of the RBF neural network based on the number of epochs.

Figure 12. Linear regression between experimental data and RBF outputs.

Figure 16. 3D plots of CO2 adsorption based on (a) temperature and BET surface, (b) pressure and BET surface for MLP trained with the LM training algorithm.

Figure 17. 3D plots of CO2 adsorption based on (a) mesopore volume and BET surface, (b) micropore volume and BET surface for MLP trained with the LM training algorithm.

Table 1. Some studies carried out in the application of neural networks on CO2 adsorption.

Table 2. The range of data employed in this study.

Table 3. The outcomes of operating neural networks with various activation function combinations.

Table 4. The results of implementing networks with diverse algorithms and optimal architectures.

Table 5. The MSE values for the spread range of 3 to 12 in the RBF network.

Table 6. Prediction of CO2 uptake by MLP neural network models with distinct training algorithms.

Table 7. Prediction of CO2 uptake by RBF neural network models with distinct spread values.